Parse files in parallel when possible #21175

Open
ilevkivskyi wants to merge 3 commits into python:master from ilevkivskyi:parallel-native-parse
@ilevkivskyi
Member

The idea is simple: the new parser doesn't need the GIL, so we can parse files in parallel. I'm not sure why, but the best I see is a ~4-5x speed-up with 8 threads; adding more threads doesn't make it visibly faster (I have 16 physical cores).
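As a rough sanity check on those numbers, here is a small sketch that applies Amdahl's law to the reported speed-up. It assumes the midpoint 4.5x at 8 threads and assumes the plateau is caused only by a serial fraction of the work; both are illustrative assumptions, not measurements from this PR, and the function names are made up for the example.

```python
def parallel_fraction(speedup: float, threads: int) -> float:
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) to solve for p."""
    return (1 - 1 / speedup) / (1 - 1 / threads)


def amdahl_speedup(p: float, threads: int) -> float:
    """Predicted speed-up when a fraction p of the work scales with threads."""
    return 1 / ((1 - p) + p / threads)


# Assumed midpoint of the reported ~4-5x speed-up at 8 threads.
p = parallel_fraction(4.5, 8)
print(f"parallel fraction: {p:.3f}")
print(f"predicted at 16 threads: {amdahl_speedup(p, 16):.2f}x")
```

Under these assumptions the model gives a parallel fraction of about 0.89 and roughly 6x at 16 threads. A serial fraction alone would therefore still predict some gain past 8 threads, so the flat curve may have another cause; this is only a back-of-the-envelope check, not a diagnosis.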

Some notes on implementation:

  • I use the stdlib ThreadPoolExecutor; it seems to work OK.
  • I refactored parse_file() a bit so that we parallelize (mostly) just the actual parsing. I see a measurable degradation if I try to parallelize all of parse_file().
  • I do not use psutil because it is an optional dependency. We may want to make it a required dependency at some point.
  • There appears to be a weird mypyc bug that sometimes causes ast_serialize to be None in some threads. I have added an ugly workaround for now.
  • I only implement parallelization in the coordinator process. The counterpart for workers can be done after #21119 ("Split type-checking into interface and implementation in parallel workers") is merged (it will be trivial).
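The shape described in the notes above can be sketched roughly as follows. This is not the code in this PR: `read_source`, `parse_source`, `default_workers`, and `parse_all` are hypothetical names, and the stdlib `ast.parse` is only a stand-in for the new parser. Unlike that parser, `ast.parse` holds the GIL, so this shows the structure (sequential preparation, parallel parsing, worker count from `os.cpu_count()` instead of psutil) and not the speed-up.

```python
from __future__ import annotations

import ast
import os
from concurrent.futures import ThreadPoolExecutor


def read_source(path: str) -> str:
    """Sequential step: stands in for the parts of parsing a file that stay serial."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def parse_source(source: str) -> ast.Module:
    """Stand-in for the actual parser; only this step runs in the thread pool."""
    return ast.parse(source)


def default_workers(limit: int = 8) -> int:
    # os.cpu_count() counts logical CPUs and may return None, hence the fallback.
    # The cap of 8 is an assumption that mirrors the plateau reported above.
    return max(1, min(limit, os.cpu_count() or 1))


def parse_all(paths: list[str], workers: int | None = None) -> dict[str, ast.Module]:
    # Read sequentially, then hand only the parsing to the pool.
    sources = [read_source(p) for p in paths]
    with ThreadPoolExecutor(max_workers=workers or default_workers()) as pool:
        # pool.map preserves input order, so trees line up with paths.
        trees = list(pool.map(parse_source, sources))
    return dict(zip(paths, trees))
```

Keeping the file reads outside the pool matches the observation that parallelizing everything was slower than parallelizing just the parse step, though the reason for that degradation is not stated here.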

cc @JukkaL

@github-actions
Contributor

github-actions bot commented Apr 6, 2026

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
